Assignment 5

Author: Mukai (Tom Notch) Yu

Email: mukaiy@andrew.cmu.edu

Q1. Classification Model (40 points)

Accuracy: 0.9790136411332634

Success cases: chair, lamp, vase

Failure cases: chair, lamp, vase

These failures may be due to missing or misleading stereotypical global features; for example, a chair is expected to have a horizontal seat board.

Q2. Segmentation Model (40 points)

Accuracy: 0.9031894651539708
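As a rough illustration of how the overall accuracy and the per-sample accuracies below relate, this sketch computes a per-sample accuracy (the fraction of points with the correct part label) and picks the best and worst samples. The helper names are hypothetical, and pooling all points for the overall number is an assumed definition (it matches the mean of per-sample accuracies whenever every sample has the same number of points).

```python
from typing import List, Sequence, Tuple


def sample_accuracy(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Fraction of points in one sample whose predicted label matches the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


def overall_accuracy(preds: Sequence[Sequence[int]], gts: Sequence[Sequence[int]]) -> float:
    """Point-level accuracy pooled over all samples (assumed definition)."""
    correct = sum(sum(p == g for p, g in zip(pr, gt)) for pr, gt in zip(preds, gts))
    return correct / sum(len(gt) for gt in gts)


def best_and_worst(
    preds: Sequence[Sequence[int]], gts: Sequence[Sequence[int]], k: int
) -> Tuple[List[Tuple[float, int]], List[Tuple[float, int]]]:
    """Return (accuracy, sample index) pairs: top-k descending and bottom-k ascending."""
    scores = [(sample_accuracy(p, g), i) for i, (p, g) in enumerate(zip(preds, gts))]
    best = sorted(scores, key=lambda s: s[0], reverse=True)[:k]
    worst = sorted(scores, key=lambda s: s[0])[:k]
    return best, worst


# Toy example with three 4-point samples.
preds = [[0, 0, 1, 1], [0, 1, 1, 1], [2, 2, 2, 2]]
gts = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 0, 0]]
print(overall_accuracy(preds, gts))  # 0.75
print(best_and_worst(preds, gts, k=2))  # ([(1.0, 0), (0.75, 1)], [(0.5, 2), (0.75, 1)])
```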

Best Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.9967

gt pred

Sample Accuracy: 0.9961

gt pred

Sample Accuracy: 0.9939

Worst Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.4625

gt pred

Sample Accuracy: 0.4703

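The first failure case might be due to the network's receptive field, which covers either a single point (the shared per-point MLP) or all points (the global max-pool) with nothing in between, so local part structure is never modeled. The second is most likely due to geometrically intermixed labels.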
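To make the receptive-field argument concrete, here is a minimal PointNet-style sketch, assuming a single shared linear + ReLU layer with hypothetical fixed weights in place of the trained MLP. Each per-point feature depends only on its own point, while the max-pooled global feature depends on every point; no layer ever sees a local neighborhood.

```python
from typing import List

Point = List[float]

# Hypothetical fixed weights standing in for a trained shared MLP (3 -> 4 channels).
W = [[0.5, -1.0, 0.3], [1.2, 0.4, -0.7], [-0.2, 0.9, 1.1], [0.8, 0.8, -0.5]]
B = [0.1, -0.2, 0.0, 0.05]


def shared_mlp(p: Point) -> List[float]:
    """Linear + ReLU applied to one point: its output depends on that point only."""
    return [max(0.0, sum(w * x for w, x in zip(row, p)) + b) for row, b in zip(W, B)]


def point_features(points: List[Point]) -> List[List[float]]:
    """Per-point features: receptive field = a single point."""
    return [shared_mlp(p) for p in points]


def global_feature(points: List[Point]) -> List[float]:
    """Channel-wise max over all points: receptive field = the whole cloud."""
    return [max(channel) for channel in zip(*point_features(points))]


cloud = [[0.1, 0.2, 0.3], [0.5, -0.4, 0.2], [-0.3, 0.8, -0.1]]
moved = [cloud[0], [10.0, 10.0, 10.0], cloud[2]]  # move one point far away

print(point_features(cloud)[0] == point_features(moved)[0])  # True: point 0 is unaffected
print(global_feature(cloud) == global_feature(moved))  # False: the global feature changes
```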

Q3. Robustness Analysis (20 points)

Random Rotation

A different random rotation is applied to each test sample.
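One way to draw such per-sample rotations is sketched below, assuming rotations uniformly distributed over all of SO(3) via a random unit quaternion (Shoemake's method). The rotation scheme actually used for this evaluation (full 3D vs. a single axis, angle range) is not stated, so this choice and the helper names are assumptions.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_rotation(rng: random.Random) -> Matrix:
    """Uniform random 3x3 rotation matrix from a random unit quaternion (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    w = math.sqrt(1.0 - u1) * math.sin(2.0 * math.pi * u2)
    x = math.sqrt(1.0 - u1) * math.cos(2.0 * math.pi * u2)
    y = math.sqrt(u1) * math.sin(2.0 * math.pi * u3)
    z = math.sqrt(u1) * math.cos(2.0 * math.pi * u3)
    # Standard unit-quaternion (w, x, y, z) to rotation-matrix conversion.
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(points: Sequence[Sequence[float]], R: Matrix) -> List[List[float]]:
    """Apply R to every point (p -> R @ p)."""
    return [[sum(R[r][c] * p[c] for c in range(3)) for r in range(3)] for p in points]


def rotate_each_sample(samples, seed: int = 0):
    """Draw an independent rotation for every test sample."""
    rng = random.Random(seed)
    return [rotate(s, random_rotation(rng)) for s in samples]


samples = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]]
rotated = rotate_each_sample(samples)
```

Because each sample gets its own matrix, a model that learned a fixed "up" direction sees a different orientation for every object.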

Classification

Accuracy: 0.3179433368310598

Success cases: chair, lamp, vase

Failure cases: chair, lamp, vase

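The result is quite bad. Since the model was not trained with random rotations, it has learned to rely on the upright (gravity-aligned) orientation, which becomes a misleading cue once the test samples are rotated.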

Segmentation

Accuracy: 0.3485492706645057

Best Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.9328

gt pred

Sample Accuracy: 0.9076

gt pred

Sample Accuracy: 0.8885

Worst Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.0128

gt pred

Sample Accuracy: 0.0514

The result is again quite bad. As before, the model was not trained with random rotations, so the gravity-aligned orientation it relies on becomes a misleading cue.

Fewer Points

The number of points is reduced to 100 per sample.
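A minimal sketch of the subsampling, assuming uniform random sampling without replacement (the actual selection scheme is not stated). The detail that matters for segmentation is that labels are gathered with the same indices as the points; `subsample` is a hypothetical helper.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsample(
    points: Sequence[T],
    labels: Optional[Sequence[int]] = None,
    n: int = 100,
    rng: Optional[random.Random] = None,
) -> Tuple[List[T], Optional[List[int]]]:
    """Keep n points chosen uniformly at random without replacement.

    Labels (for segmentation) are gathered with the SAME indices, so each
    kept point keeps its own part label.
    """
    if n > len(points):
        raise ValueError("cannot keep more points than the cloud has")
    rng = rng or random.Random()
    idx = rng.sample(range(len(points)), n)
    kept_points = [points[i] for i in idx]
    kept_labels = [labels[i] for i in idx] if labels is not None else None
    return kept_points, kept_labels


# Toy cloud of 1,000 points where label = x-coordinate mod 4.
cloud = [[float(i), 0.0, 0.0] for i in range(1000)]
labels = [i % 4 for i in range(1000)]
pts, labs = subsample(cloud, labels, n=100, rng=random.Random(0))
print(len(pts), len(labs))  # 100 100
```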

Classification

Accuracy: 0.9307450157397692

Success cases: chair, lamp, vase

Failure cases: chair, lamp, vase

The result is still good thanks to the global max-pooling: as long as one point's response in a decisive feature channel stands out, the class can still be determined.
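The max-pooling argument can be checked numerically. In this sketch (random features plus one hypothetical "decisive" point), the pooled value of a channel is unchanged as long as the point that maximizes it survives subsampling, and a subset's max can never exceed the full cloud's max.

```python
import random
from typing import List, Sequence


def max_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Channel-wise max over points (PointNet-style symmetric global aggregation)."""
    return [max(channel) for channel in zip(*features)]


rng = random.Random(0)
n_points, n_channels, decisive = 1000, 3, 42
features = [[rng.random() for _ in range(n_channels)] for _ in range(n_points)]
features[decisive][2] = 5.0  # hypothetical point that strongly fires a class-deciding channel

# Keep 100 points, making sure the decisive point survives.
others = [i for i in range(n_points) if i != decisive]
keep = [decisive] + rng.sample(others, 99)

full = max_pool(features)
sub = max_pool([features[i] for i in keep])
print(full[2], sub[2])  # 5.0 5.0: the deciding channel is unchanged
print(all(s <= f for s, f in zip(sub, full)))  # True: a subset's max never exceeds the full max
```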

Segmentation

Accuracy: 0.8055591572123176

Best Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 1.0000

gt pred

Sample Accuracy: 1.0000

gt pred

Sample Accuracy: 1.0000

Worst Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.2800

gt pred

Sample Accuracy: 0.3200

Accuracy also does not drop much. Since the receptive field is either a single point or all points, it is enough for one of the 100 points to stand out in a given channel: the max-pooled value of that channel is shared with every point and still gives a holistic description of the entire point cloud.
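For segmentation, a PointNet-style head typically sees each point's own feature concatenated with the global max-pooled feature. The sketch below assumes that layout (as in the original PointNet segmentation network); the feature dimensions used in this report are not stated, so the toy values and the helper name are hypothetical.

```python
from typing import List, Sequence


def segmentation_features(local_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each point's own feature with the cloud's max-pooled global feature."""
    global_feat = [max(channel) for channel in zip(*local_feats)]
    return [list(f) + global_feat for f in local_feats]


# Three points with 2-channel local features; only point 1 fires channel 1.
local = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
for row in segmentation_features(local):
    print(row)
# [1.0, 0.0, 1.0, 2.0]
# [0.0, 2.0, 1.0, 2.0]
# [0.5, 0.5, 1.0, 2.0]
```

Every point receives point 1's channel-1 response through the global half of its feature, which is the "one point stands out" effect described above.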

Q4. Bonus Question - Locality (20 points)

Implemented a Point Transformer with the number of nearest neighbors set to 8, trained on randomly rotated point clouds of 100 points each.
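Neighbor selection with k = 8 could look like the brute-force sketch below. It is O(N²), which is fine for 100 points; including the query point as its own neighbor is an assumption (a zero-distance point is naturally returned first), and `knn_indices` is a hypothetical helper.

```python
import heapq
import math
from typing import List, Sequence


def knn_indices(points: Sequence[Sequence[float]], k: int = 8) -> List[List[int]]:
    """For every point, indices of its k nearest points (itself included, at distance 0)."""
    if k > len(points):
        raise ValueError("k cannot exceed the number of points")
    neighbors = []
    for p in points:
        # (distance, index) pairs; ties are broken by the smaller index.
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points))
        neighbors.append([j for _, j in heapq.nsmallest(k, candidates)])
    return neighbors


line = [[float(i), 0.0, 0.0] for i in range(20)]
nbrs = knn_indices(line, k=8)
print(nbrs[0])  # [0, 1, 2, 3, 4, 5, 6, 7]
```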

For segmentation, the U-Net architecture was not adopted: the model is just 4 plain transformer layers without down- or upsampling.
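The sketch below shows a simplified Point Transformer vector self-attention layer, y_i = x_i + Σ_j softmax_j(γ(φ(x_i) − ψ(x_j) + δ_ij)) ⊙ (α(x_j) + δ_ij) with δ_ij = θ(p_i − p_j), stacked 4 times at a fixed resolution, so the kNN graph is computed once and the number of points never changes. It is an illustration under stated simplifications, not this report's implementation: every mapping is a single random linear layer (the original paper uses MLPs for γ and θ), the weights are untrained, and the feature width and the placeholder class count of 6 are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vec = List[float]
Mat = List[List[float]]


def linear(W: Mat, x: Sequence[float]) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Mat:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def knn(pos: Sequence[Vec], k: int) -> List[List[int]]:
    """Brute-force kNN (each point's list includes itself)."""
    return [sorted(range(len(pos)), key=lambda j: math.dist(p, pos[j]))[:k] for p in pos]


class VectorAttentionLayer:
    """Simplified vector attention with a residual; softmax is per channel over the k neighbors."""

    def __init__(self, dim: int, rng: random.Random):
        self.phi, self.psi, self.alpha, self.gamma = [rand_matrix(rng, dim, dim) for _ in range(4)]
        self.theta = rand_matrix(rng, dim, 3)  # positional encoding of p_i - p_j

    def __call__(self, pos, feats, nbrs):
        out = []
        for i, neigh in enumerate(nbrs):
            query = linear(self.phi, feats[i])
            logits, values = [], []
            for j in neigh:
                delta = linear(self.theta, [a - b for a, b in zip(pos[i], pos[j])])
                key = linear(self.psi, feats[j])
                logits.append(linear(self.gamma, [q - kk + d for q, kk, d in zip(query, key, delta)]))
                values.append([v + d for v, d in zip(linear(self.alpha, feats[j]), delta)])
            y = []
            for c in range(len(query)):
                top = max(row[c] for row in logits)  # subtract the max for a stable softmax
                weights = [math.exp(row[c] - top) for row in logits]
                total = sum(weights)
                y.append(sum(w / total * v[c] for w, v in zip(weights, values)))
            out.append([x + yc for x, yc in zip(feats[i], y)])  # residual connection
        return out


def segment(pos, num_classes=6, dim=8, k=8, num_layers=4, seed=0):
    """Embed -> 4 attention layers at full resolution -> per-point class logits."""
    rng = random.Random(seed)
    embed = rand_matrix(rng, dim, 3)
    layers = [VectorAttentionLayer(dim, rng) for _ in range(num_layers)]
    head = rand_matrix(rng, num_classes, dim)
    nbrs = knn(pos, k)  # computed once: no down/upsampling, so the point set never changes
    feats = [linear(embed, p) for p in pos]
    for layer in layers:
        feats = layer(pos, feats, nbrs)
    return [linear(head, f) for f in feats]


rng = random.Random(1)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
logits = segment(cloud)
labels = [max(range(len(row)), key=row.__getitem__) for row in logits]
print(len(logits), len(logits[0]))  # 100 6
```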

Classification

Accuracy: 0.8719832109129066

Success cases: chair, lamp, vase

Failure cases: chair, lamp, vase

The result is much better than in the previous random-rotation evaluation (0.3179), because the model now has a local receptive field that lets features propagate among nearby neighbors. Training with random rotations likely also contributes to the improvement.
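One way to see why stacking layers matters: after L layers of kNN aggregation, a point's feature can depend on every point reachable within L neighbor hops. The sketch below computes that set on a toy 1D example; `receptive_field` is a hypothetical helper, and the example assumes each point's neighbor list includes itself.

```python
import math
from typing import List, Sequence, Set


def knn(pos: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force kNN; each point's list includes itself (distance 0)."""
    return [sorted(range(len(pos)), key=lambda j: math.dist(p, pos[j]))[:k] for p in pos]


def receptive_field(nbrs: List[List[int]], i: int, num_layers: int) -> Set[int]:
    """Indices whose input features can reach point i after num_layers of neighbor aggregation."""
    field = {i}
    for _ in range(num_layers):
        field = field | {j for m in field for j in nbrs[m]}
    return field


# Toy 1D cloud: 20 points on a line, k = 3 (self + left + right).
line = [[float(x), 0.0, 0.0] for x in range(20)]
nbrs = knn(line, k=3)
for layers in (1, 2, 4):
    print(layers, sorted(receptive_field(nbrs, 10, layers)))
# 1 [9, 10, 11]
# 2 [8, 9, 10, 11, 12]
# 4 [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

With k = 8 and 4 layers the set grows much faster than in this 1D toy, but it still starts local, unlike the single-point or whole-cloud receptive field of the earlier model.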

Segmentation

Accuracy: 0.6960291734197731

Best Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 1.0000

gt pred

Sample Accuracy: 0.9900

gt pred

Sample Accuracy: 0.9900

Worst Segmentation Ground Truth (left) vs Prediction (right):

gt pred

Sample Accuracy: 0.0500

gt pred

Sample Accuracy: 0.0900

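Segmentation also improves substantially over the previous random-rotation evaluation (0.3485), again because of the local receptive field; as with classification, training with random rotations likely contributes as well.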